Tested in ComfyUI on M1 Max 64GB: Anima-Base v1.0 matches preview3-base in speed; WAI-Anima kana LoRA hits 22% on light prompts but 67% with hood+robe+embroidery added.
Tested Q_rsqrt on Apple M4 (Mac mini) and Zen 3 (Ryzen 5800HS / WSL2). On M4, -O2 already rewrites 1/sqrtf to frsqrte and ties Q_rsqrt; on x86, clang needs -ffast-math or hits a 12x gap. Hand-written NEON/SSE wrappers turn out slower. Also covers the error after 0/1/2 Newton iterations and the Lomont constant.
oMLX 0.3.9.dev2 release notes, read from the angle of running Codex/Copilot against local LLMs on a Mac: Gemma 4 VLM MTP, DFlash, omlx launch copilot, SSD KV cache, and what each changes for agent workflows.
Tested Klein 9B + 9B NSFW LoRA on M1 Max 64GB via mflux 0.17.5: 1m51s/512, 5m37s/1024 q4, 224/224 LoRA keys match, NSFW prompts uncensored, Japanese subjects work with helper tokens.
Investigated whether NSFW LoRAs for FLUX.2 Klein 9B can run on M1 Max 64GB. Covers model compatibility, LoRA application paths, RunPod verification strategy, and VRAM requirements for training your own LoRA with ai-toolkit.
Three local image generation engines (WAI-Anima, WAI-IL/SDXL, FLUX.2 Klein 4B) tied together by a thin FastAPI wrapper that takes Japanese prompts. Ollama (gemma3:12b) handles JP→EN, ComfyUI workflows are built on the fly in Python, FLUX.2 runs as an mflux subprocess, and the whole thing is reachable from an iPhone over Tailscale.
Hands-on log of building the DEV article's PDF RAG on M1 Max 64GB, extending it with images via CLIP, and pushing through Japanese with bge-m3 + Qwen3.6 35B. Documents the modality gap, the dual inference server crash, and LLM-jp 4-8B's empty chat template silently dropping the system role.
A hands-on log of running Qwen-Scope's Sparse Autoencoder locally on M1 Max 64GB with Qwen3-8B-Base, extracting feature IDs that discriminate between Japanese, English, code, and Chinese from a single middle layer.
Hands-on benchmark of FLUX.2 Klein 4B on M1 Max 64GB using mflux (MLX) and iris.c (pure C + Metal). A counter to Pruna AI's H100-only tutorial — measuring how fast Apple Silicon actually gets there.
After Xiaomi MiMo-V2.5's weights went public, I checked whether it runs on Mac/ROCm or on cloud GPUs (RunPod/GCE). It's still rough on local hardware, but RunPod's 4x H200 runs it for ~$14/hr and GCE Spot H100 brings it down to ~$1.6/hr.
Confirmed SeeSee21/Z-Anime is a full fine-tune of Z-Image Base, then ran the AIO version on local ComfyUI on an M1 Max 64GB to verify t2i, i2i, and how NSFW prompts pass through.
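
For the Q_rsqrt entry above, the "error after 0/1/2 Newton iterations" idea can be sketched in a few lines. This is a minimal illustration, not the benchmark code from that article: it emulates the float32 bit trick in Python with `struct`, runs the Newton steps in double precision (so the figures will differ slightly from a pure float32 C build), and the helper names `q_rsqrt` and `max_rel_error` are hypothetical. The default magic number is the classic 0x5F3759DF; Lomont's refined constant 0x5F375A86 can be passed in for comparison.

```python
import math
import struct

QUAKE_MAGIC = 0x5F3759DF   # classic constant
LOMONT_MAGIC = 0x5F375A86  # Lomont's refined constant


def q_rsqrt(x: float, newton_steps: int = 1, magic: int = QUAKE_MAGIC) -> float:
    """Approximate 1/sqrt(x) for positive x using the float32 bit trick."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Initial guess: magic minus half the integer representation.
    bits = (magic - (bits >> 1)) & 0xFFFFFFFF
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Each Newton-Raphson step refines y toward 1/sqrt(x).
    # Assumption: steps run in Python doubles, not float32.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def max_rel_error(newton_steps: int, magic: int = QUAKE_MAGIC, samples: int = 1000) -> float:
    """Worst relative error over x = 0.1, 0.2, ..., samples * 0.1."""
    worst = 0.0
    for k in range(1, samples + 1):
        x = k * 0.1
        exact = 1.0 / math.sqrt(x)
        approx = q_rsqrt(x, newton_steps, magic)
        worst = max(worst, abs(approx - exact) / exact)
    return worst


if __name__ == "__main__":
    for steps in (0, 1, 2):
        print(steps, max_rel_error(steps), max_rel_error(steps, LOMONT_MAGIC))
```

Running it prints the worst-case relative error per iteration count for both constants; the error should shrink sharply with each added Newton step.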